---
title: first snkrfinder.model.cvae
keywords: fastai
sidebar: home_sidebar
nb_path: "nbs/02c_model.cvae.ipynb"
---
TODO: clean up this preamble section
preamble: This is a project initiated while I was an Insight Data Science fellow. It grew out of my interest in making data-driven tools in the fashion/retail space, where I had most recently been working. The original, over-scoped idea was to make a shoe design tool which could quickly develop some initial sneakers based on choosing some examples and some text descriptors. Designs are constrained by the "latent space" defined (discovered?) by a database of shoe images. However, given the 3-week sprint allowed for development, I pared the tool down to a simple "aesthetic" recommender for sneakers, using the same idea of an embedding space defined by the database of shoe images.
Most of the code is derived from the work of @EtienneT (e.g. TabularData: https://github.com/EtienneT/TabularVAE, and later https://github.com/EtienneT/vae).
Auto Encoder: AE
convolutional encoder / decoder
encoders
decoders
LatentLayer
Variational Auto Encoder: VAE or 𝜷-VAE
- model based data cleaning (widget module?)
- GAN finetuning?
- crappify general pattern
- throw out based on inspection of high loss
- try mixed labels? things that are >50% sneakers included???
Load the saved merged database and set the seeds. Then double-check that our data is where we expect it.
df = prep_df_for_datablocks(df)
Don't forget to set n_inp=1. Otherwise the default makes the number of inputs len(blocks)-1. Also note that FeatsResize is used to avoid the random jittering from resize during training. Only the very narrow batch augmentations will be used.
Variational Auto-Encoder for fastai
I'm going to use a generic convolutional net as the basis of the encoder, and its reverse as the decoder. This is a proof of concept for using the fastai framework; I will experiment with pre-trained resnet and MobileNet_v2 later. I'd like to use the MobileNet_v2 as a direct link to the "SneakerFinder" tool which motivated this experiment. [see SneakerFinder]
A variational "module" will sit between the encoder and decoder as the "Bottleneck". The Bottleneck will map the resnet features into a latent space (e.g. ~100 dimensions) represented by standard normal variables. The "reparameterization trick" will sample from this space and the "decoder" will generate images.
Finally, a simple "decoder" will sample from the variational latent space and be trained to reconstruct the images.
The intention is the latent space can be used to generate novel sneaker images.
Although we give up the original utility we are going for (creating new sneakers via the latent space), having otherwise equivalent non-variational autoencoders for reference will be great. Furthermore, this latent space representation will be amenable to an MMD regularization later on. This will be useful to avoid some of the limitations of the KLD as a regularizer (overestimation of variance, and some degenerate convergences). It's sort of hacky, but keeping the tooling equivalent to the betaVAE will ultimately give some advantages.
It is convenient to avoid the class wrappers to simplify the param splitting for freezing the pre-trained architecture.
We could enumerate the class layers and return a sequential model, but simply making some functions to put the layers together is better.
These inherit from Module (fastai's wrapper on nn.Module). The base class AELoss initializes with batchmean, alpha, and useL1 parameters to set how the loss will be aggregated and regularized. For the basic AutoEncoder we'll regularize the latent with an L1 penalty to keep the magnitudes from exploding.
{% include note.html content='Here batchmean means we compute the loss (either L1 or L2, depending on the useL1 flag) with reduction=’sum’ and divide it by the batch size. This technically makes it a cost computed for each batch. This same convention will be used later for the KL-Divergence and MMD latent regularizers.' %}
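To make the batchmean convention concrete, here is a minimal plain-Python sketch. It is not the AELoss implementation: `sum_squared_error` and `batchmean_loss` are hypothetical names, and the toy values are made up. It only shows the difference between dividing the summed error by the batch size and dividing it by the number of elements.

```python
def sum_squared_error(x_hat, x):
    """Sum of squared errors over every element (the reduction='sum' case)."""
    return sum((a - b) ** 2
               for row_hat, row in zip(x_hat, x)
               for a, b in zip(row_hat, row))

def batchmean_loss(x_hat, x):
    """Summed error divided by the batch size, not by the element count."""
    return sum_squared_error(x_hat, x) / len(x)

# Toy batch: two "images" of three pixels each (illustrative values only)
x     = [[0.0, 0.0, 0.0], [1.0, 1.0, 1.0]]
x_hat = [[1.0, 0.0, 0.0], [1.0, 1.0, 3.0]]

per_batch = batchmean_loss(x_hat, x)             # (1 + 4) / 2 images
per_element = sum_squared_error(x_hat, x) / 6    # (1 + 4) / 6 elements
```

With batchmean the reconstruction term grows with the number of pixels per image, which is why the regularizer weight alpha may need rescaling against it.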
These are some helpers for computing the KL Divergence as a function and a Module
The fastai Learner class runs the training loop. It took me a little digging into the code to figure out how Metrics are called, since it's not really stated anywhere in the documentation (Note: create PR for fastai for extra documentation on Metrics logic). By default one of the key Callbacks is the Recorder. It prints out the training summary at each epoch (via ProgressCallback) and collects all the Metrics. By default only the loss is a train_met; the others are valid_met.
The Recorder resets (maps reset() to all mets) the metrics before_train and before_valid. The Recorder maps accumulate() to the metrics on after_batch. Finally
AnnealedLossCallback will inject the latent mu and logvar and a kl_weight variable into our loss. The mu and logvar will be used to compute the KLD. The kl_weight is a scheduled weighting for the KLD. You can see the schedule graph of the parameter. At the beginning it will be 0, so the KLD part of the loss is ignored: during the first 10% of training we fit a normal auto-encoder. Then, gradually over 30% of training, kl_weight increases to 1 and remains there for the rest of training, so that the auto-encoder becomes fully variational. The way this callback is written, the loss receives this parameter, but the model does not.
n_epochs = 10
f_init = combine_scheds([.1, .7, .2], [SchedNo(0,0),SchedCos(0,1), SchedNo(1,1)])
# f = combine_scheds([.8, .2], [SchedCos(0,0), SchedCos(0,.5)])
p = torch.linspace(0.,1.,100)
pp = torch.linspace(0.,1.*n_epochs,100)
plt.plot(pp,[f_init(o) for o in p])
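As a framework-free sketch of what the combined schedule computes, the function below mirrors the `combine_scheds([.1, .7, .2], ...)` call above: flat at 0, a cosine ramp from 0 to 1, then flat at 1. The name `kl_weight` and its `warm`/`ramp` arguments are hypothetical, and this is an illustration of the shape rather than fastai's implementation.

```python
import math

def kl_weight(pos, warm=0.1, ramp=0.7):
    """KL weight at training progress `pos` in [0, 1]."""
    if pos < warm:
        return 0.0                      # plain auto-encoder phase
    if pos >= warm + ramp:
        return 1.0                      # fully variational phase
    t = (pos - warm) / ramp             # progress within the ramp, 0..1
    return 0.5 * (1.0 - math.cos(math.pi * t))

# Same sampling as the plot above: 100 points across training
schedule = [kl_weight(i / 99) for i in range(100)]
```

Changing `warm` and `ramp` is enough to compare a short burn-in against a long one before committing to a full training run.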
{% include warning.html content='Avoid using early stopping because the AnnealedLossCallback will make the loss grow once the KL divergence weight kicks in. ' %}
I want to note something here that was a little confusing to me: params(model) is a built-in function from fastai's torch_core module which returns all of the parameters of a module, i.e.
def params(m):
    "Return all parameters of `m`"
    return [p for p in m.parameters()]
The top-level fastai core functions with simple names that almost match class attributes were one of my biggest stumbling blocks in getting acquainted with the fastai v2 API. (The other is the documentation, which is autogenerated by the nbdev framework from their development notebooks. More on that struggle, and my tips if that is troublesome for you, later (here).)
{% include note.html content='It is crucial that you don’t freeze the batch norm layers. The bn_splitter collects all the batchnorm layers. The simple splitting we do only freezes the encoder and leaves the latent layers (i.e. the VAE or linear encoding bottleneck) and the decoder in a parameter group with the batchnorm layers.' %}
Splitters {% include warning.html content='There are two completely different splitters in the fastai API. This splitter groups the model parameters into groups for freezing and for progressive learning rates. (The other one splits data into train and validation sets. I got immediately confused by this when I first started with the API.)' %}
TODO:more sophisticated parameter splitting to enable progressive learning rates
MobileNet_v2 as the encoder, as a continuation of the original Sneaker Finder
simple bowtie convolutional encoder / decoder (Mimics the GOAT medium blog)
- Architecture Hyperparameters:
- Latent Size (research default 256, production default 32)
- Filter Factor Size (research default 16, production default 32)
- Latent Linear Hidden Layer Size (research default 2048, production default 1024)
- The encoder architecture is as follows with research defaults from above:
- Input 3x128x128 (conv2d block [conv2d, batchnorm2d, relu])
- 16x64x64 (conv2d block [conv2d, batchnorm2d, relu])
- 32x32x32 (conv2d block [conv2d, batchnorm2d, relu])
- 64x16x16 (conv2d block [conv2d, batchnorm2d, relu])
- 128x8x8 (conv2d block [conv2d, batchnorm2d, relu])
- Flatten to 8192
- 2048 (linear block [linear, batchnorm1d, relu])
- Split the 2048 dimension into mu and log variance for the parameters of the latent distribution
- Latent mu size 256 (linear layer only with bias)
- Latent logvar size 256 (linear layer only with bias)
- In the middle here you can break out the BCE and KLD loss for the final loss term and use the standard reparam trick to sample from the latent distribution.
- Decoder architecture an exact mirror
- Input 256
- 2048 (linear block [linear, relu])
- 8192 (linear block [linear, batchnorm1d, relu])
- reshape (128x8x8)
- 64x16x16 (conv2d transpose block [convtranspose2d, batchnorm2d, relu])
- 32x32x32 (conv2d transpose block [convtranspose2d, batchnorm2d, relu])
- 16x64x64 (conv2d transpose block [convtranspose2d, batchnorm2d, relu])
- 3x128x128 (conv2d transpose [convtranspose2d, sigmoid])
- For weight initialization I used a normal distribution centered at zero with 0.02 set for the stddev. Optimizer: Adam with default parameters; if I were to do it over again I'd spend more time here understanding the learning dynamics. The dataset was about 10,000 with a 70/20/10 split, batch size 64, over 120 epochs, with a learning-rate schedule that reduced the rate when the loss started to plateau. No crazy image augmentation, just resizing and standard flips. I used the ANN package Annoy to do the NN search for prod, normalizing the embeddings and using cosine similarity; the ANN factor was 128 for num_trees.
- MMD regularized VAE where the latents are drawn from a
TODO:Ranger optimizer might really help .. test
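The encoder shapes listed above for the bowtie architecture follow a simple rule: each conv block halves the spatial size and doubles the channel count, starting from the filter factor. The sketch below reproduces that arithmetic with the research defaults. `encoder_shapes` is a hypothetical helper, and the stride-2 behaviour of every block is an assumption read off the listed shapes.

```python
def encoder_shapes(im_size=128, in_ch=3, filter_factor=16, n_blocks=4):
    """Return the (channels, height, width) after each stride-2 conv block."""
    shapes = [(in_ch, im_size, im_size)]
    ch, size = filter_factor, im_size
    for _ in range(n_blocks):
        size //= 2                  # each block halves height and width
        shapes.append((ch, size, size))
        ch *= 2                     # and doubles the channel count
    return shapes

shapes = encoder_shapes()
c, h, w = shapes[-1]
flat_dim = c * h * w                # size of the flattened feature vector
```

The final `flat_dim` is the input width of the first linear block, so changing the filter factor or image size changes that layer too.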
We can also use the transfer learning VAE tooling we previously built. We just need to create the convolutional encoder and pass it in... Note that we don't have a pre-trained option, so DON'T FREEZE!
Now just wrap that simple conv block architecture into a builder. And a meta-wrapper to let us call the conv_encoder and pre-trained options with the same function. (I'll also put the get_pretrained_parts function here now even though we won't use it till the next section, so that we can make the get_encoder_parts generic wrapper handle both properly.)
##
latent_dim = 128
# equalize KLDiv wrt errors per pixel
alpha = 3*IMG_SIZE*IMG_SIZE/latent_dim
alpha /= 20 # 5% regularizer
batchmean = True
useL1 = False
hidden_dim = None
# SaveModelCallback(fname=datetime.now().strftime('%Y-%m-%d %Hh%M.%S'), every_epoch=True),
cbs = [AnnealedLossCallback(),TerminateOnNaNCallback(), ParamScheduler({'kl_weight': SchedNo(1.,1.) })]
metrics = default_AE_metrics(alpha,batchmean,useL1)
block = get_ae_DataBlock(aug=True)
batch_size = 64
dls = block.dataloaders(df, batch_size=batch_size)
arch='vanilla'
vae = AE(get_encoder_parts(arch), hidden_dim=hidden_dim,latent_dim=latent_dim, im_size=IMG_SIZE,out_range=OUT_RANGE)
# let beta be calculated by : 3*im_size*im_size/latent_dim
loss_func = AELoss(batchmean=batchmean,alpha=alpha,useL1=useL1)
learn = Learner(dls, vae, cbs=cbs,loss_func=loss_func, metrics=metrics,splitter=AE_split) #.to_fp16() #wd=config['wd'],opt_func=ranger,
{% include note.html content='The to_fp16() callbacks work but increasing the batch size doesn’t really speed things up.' %}
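The alpha scaling used in the setup above is easier to judge with numbers plugged in. This sketch assumes IMG_SIZE = 160, which is a guess and not stated in this section; the point is only that alpha starts as the number of pixel values per latent dimension and is then cut to 5% of that.

```python
IMG_SIZE = 160      # assumption: the real value is defined elsewhere in the notebook
latent_dim = 128

n_values = 3 * IMG_SIZE * IMG_SIZE      # RGB values per image
alpha = n_values / latent_dim           # pixel values per latent dimension
alpha /= 20                             # keep 5% of that as the regularizer weight
```

Under this assumption the latent penalty is weighted 30 times more heavily per dimension than a single pixel error, which partly offsets there being far fewer latent dimensions than pixels.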
learn = learn.to_fp16()
For several of the decoder and "sampler" layers I might want to turn off the nonlinearity to give us more reasonable "gaussian" outputs to the Variational layer and to the generated image, which is compared with the ImageNetStats batch-normalized image.
IMPORTANT VAE TIP!!! Make sure NOT to use batch normalization and non-linearity in the linear layers of the VAE. The normalization will affect the representation and the KLD constraints.
This creates a pair of latents from which we can perform the "resample" trick.
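The "resample" (reparameterization) trick can be sketched without any framework: draw unit Gaussian noise and shift and scale it with the two latents, so that the randomness sits outside the quantities being learned. The function name `reparameterize` is hypothetical, and the real layer operates on batched tensors rather than Python lists.

```python
import math
import random

def reparameterize(mu, logvar, rng=random):
    """Sample z = mu + sigma * eps with eps ~ N(0, 1), elementwise."""
    z = []
    for m, lv in zip(mu, logvar):
        sigma = math.exp(0.5 * lv)      # logvar = log(sigma^2)
        eps = rng.gauss(0.0, 1.0)       # noise independent of mu and logvar
        z.append(m + sigma * eps)
    return z

rng = random.Random(0)                  # seeded for repeatability
mu = [0.5, -1.0, 2.0]
logvar = [-50.0, -50.0, -50.0]          # tiny variance, so z should hug mu
z = reparameterize(mu, logvar, rng)
```

With a very negative logvar the sample collapses onto mu, which is the regime a plain auto-encoder lives in; the KLD term is what pushes logvar back toward 0.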
{% include note.html content='This is a $\beta$-VAE (hence BVAE) because we have a weighting factor on the KL Divergence regularization term, which acts as a Lagrange multiplier.' %}
Putting it all together gives us our VAE! Note that we'll pass in the "parts" of the encoder for ease of using pretrained (or not) architectures. The model name will correspond to the architecture of the encoder via name.
Note that the BVAE can simply inherit from the AE class we defined above. Really the only difference in the __init__ function is that a VAELayer, which performs the reparameterization trick, replaces the AELayer as self.bn.
A nice wrapper for building the encoder parts will be handy.
Sweet, we've verified the architecture works, but we need to train it with a loss that constrains the variational layers with the KL Divergence. Otherwise the simple MSE will diverge.
This simply adds a KLD regularizer to the latent space defined by two rank-1 tensors defining Gaussian latents, i.e. a mean ($\mu$) and a standard deviation ($\sigma$).
{% include note.html content='For convenience and numeric stability, $\sigma$ is represented as $\log(\sigma^{2})$, so the tensors are called mu and logvar.' %}
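For reference, the KL divergence between a diagonal Gaussian and the standard normal prior has a closed form in terms of mu and logvar. The sketch below evaluates it for a single latent vector in plain Python. `kl_divergence` is a hypothetical name, and the batch reduction and alpha weighting applied by the loss classes are left out.

```python
import math

def kl_divergence(mu, logvar):
    """KL( N(mu, sigma^2) || N(0, 1) ) summed over latent dimensions."""
    return -0.5 * sum(1.0 + lv - m * m - math.exp(lv)
                      for m, lv in zip(mu, logvar))

kl_at_prior = kl_divergence([0.0, 0.0], [0.0, 0.0])   # posterior equals the prior
kl_shifted = kl_divergence([1.0, 0.0], [0.0, 0.0])    # one mean moved to 1
```

Working with logvar keeps the variance positive without any clamping, since exp(logvar) can never go below zero.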
Here's how we put everything together.
latent_dim = 128
alpha = 5
batchmean = True
useL1 = False
hidden_dim = None
cbs = [AnnealedLossCallback(),TerminateOnNaNCallback(), ParamScheduler({'kl_weight': SchedNo(1.,1.) })]
metrics = default_VAE_metrics(alpha,batchmean,useL1)
We can use this callback if we want to save the model at every epoch. Which could be super useful if we were able to actually overfit our model.
'SaveModelCallback(fname=datetime.now().strftime('%Y-%m-%d %Hh%M.%S'), every_epoch=True)'
# NOTE: lr_find does NOT work correctly with our annealed kl_weight...
lr1,lr2=learn.lr_find()
mlr = .5*(lr1+lr2)
#geometric mean
gmlr = torch.tensor([lr1,lr2]).log().mean().exp().tolist()
lr1,lr2,mlr,gmlr
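The two averages computed above behave quite differently when the suggested rates are orders of magnitude apart. A small plain-Python comparison, using made-up rates rather than real lr_find output:

```python
import math

def arithmetic_mean(a, b):
    """Midpoint on a linear axis."""
    return 0.5 * (a + b)

def geometric_mean(a, b):
    """exp(mean(log)): the midpoint on a logarithmic axis."""
    return math.exp(0.5 * (math.log(a) + math.log(b)))

lr1, lr2 = 1e-4, 1e-2       # illustrative values, not measured suggestions
mlr = arithmetic_mean(lr1, lr2)
gmlr = geometric_mean(lr1, lr2)
```

The arithmetic mean lands close to the larger rate, while the geometric mean sits halfway between the two on the log-scaled axis that the learning rate finder plots.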
{% include note.html content='The Callbacks need to be updated for the KL loss annealing schedule as we tune. For instance, we want to set the kl_weight to 1.0 when using the learning rate finder learn.lr_find(). After an initial burn-in where the KL loss term gradually ramps in, holding the kl_weight at unity will also be useful as we separately tune the learning rate (e.g. with fit_one_cycle or fit_flat_cos).' %}
# put in the annealed KL_weight...
learn.remove_cb(learn.cbs[-1])
# add the annealed scheduler in place of the constant one
learn.add_cb(ParamScheduler({'kl_weight': default_KL_anneal_in()} ) )
n_epochs = 5
#learn.fit_one_cycle(n_epochs,lr_max= lr1)
learn.fit_one_cycle(n_epochs)
learn.show_results()
The vae with pretrained resnet encoder seems to train to a much better end-point if we keep the resnet frozen. Hence the commented out learn.unfreeze() below.
learn.remove_cb(learn.cbs[-1])
# add new constant scheduler
learn.add_cb(ParamScheduler({'kl_weight': SchedNo(1.,1.) }) )
#learn.unfreeze()
epochs = 20
#learn.fit_one_cycle(epochs, lr_max= 1e-3)
learn.fit_flat_cos(epochs,pct_start=.05)
learn.show_results()
latent_dim = 128
# equalize KLDiv wrt errors per pixel
# alpha = 3*IMG_SIZE*IMG_SIZE/latent_dim
alpha = 5
batchmean = True
useL1 = False
# SaveModelCallback(fname=datetime.now().strftime('%Y-%m-%d %Hh%M.%S'), every_epoch=True),
cbs = [AnnealedLossCallback(),TerminateOnNaNCallback(), ParamScheduler({'kl_weight': default_KL_anneal_in() })]
metrics = default_VAE_metrics(alpha,batchmean,useL1)
block = get_ae_DataBlock(aug=True)
batch_size = 128
dls = block.dataloaders(df, batch_size=batch_size)
arch='vanilla'
vae = BVAE(get_encoder_parts(arch), hidden_dim=None,latent_dim=128, im_size=IMG_SIZE,out_range=OUT_RANGE)
# let beta be calculated by : 3*im_size*im_size/latent_dim
loss_func = BVAELoss(batchmean=batchmean,alpha=alpha,useL1=False)
learn = Learner(dls, vae, cbs=cbs,loss_func=loss_func, metrics=metrics,splitter=AE_split)#.to_fp16() #wd=config['wd'],opt_func=ranger,
The MMD-VAE replaces the latent regularization term in loss_fn (KLD) with the Maximum Mean Discrepancy.
We'll make an MMDVAE class to keep things declarative, but it's really just an AE, i.e. a linear latent layer.
Additional background on MMD from [https://github.com/Saswatm123/MMD-VAE]:
Maximum Mean Discrepancy Variational Autoencoder, a member of the InfoVAE family that maximizes Mutual Information between the Isotropic Gaussian Prior (as the latent space) and the Data Distribution.

Short explanation: The traditional VAE is known as the ELBO-VAE, named after the Evidence Lower Bound used in its objective. The ELBO suffers from two problems: overestimation of latent variance, and uninformative latent information.

The latter is because one of the objective's terms is the KL-Divergence between the Gaussian parameterized by the encoder and the Standard Isotropic Gaussian. This dissuades usage of the latent code, so that the KL-Divergence term is allowed to fall even further. It is important to note that the KL-Divergence should never truly reach zero, as that means the encoder is not learning useful features and cannot find feature locality, and the decoder is just randomly sampling from Standard Gaussian noise.

The overestimation of variance results from the KL-Divergence term not being strong enough to balance against the Reconstruction Error, and thus the Encoder prefers to learn a multimodal latent distribution with spread apart means, leading to low training error as it overfits, but low quality samples as well, as the sampling distribution is assumed to be a Standard Isotropic Gaussian. One effort to mitigate this effect is the Disentangled Variational Autoencoder, which simply raises the weight on the KL-Divergence term. However, this increases the problem stated in the paragraph above since it further penalizes using the latent code.

For more detailed explanations, I used these resources to learn, in order of usefulness to me:
- https://arxiv.org/pdf/1706.02262.pdf
- http://ruishu.io/2018/03/14/vae/
- http://approximateinference.org/accepted/HoffmanJohnson2016.pdf
- https://ermongroup.github.io/blog/a-tutorial-on-mmd-variational-autoencoders/
- http://bjlkeng.github.io/posts/variational-bayes-and-the-mean-field-approximation/
- https://ermongroup.github.io/cs228-notes/inference/variational/
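As a rough illustration of the quantity that replaces the KLD, the sketch below estimates the squared MMD between two small sets of latent vectors with a Gaussian kernel. It is a plain-Python toy, not the MMDLoss used here: the function names are hypothetical, the kernel bandwidth is an assumption, and in training the second set would be samples drawn from the standard normal prior.

```python
import math

def gaussian_kernel(a, b, dim):
    """RBF kernel between two points; the bandwidth choice is an assumption."""
    sq_dist = sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return math.exp(-sq_dist / (dim * dim))

def mean_kernel(xs, ys):
    """Average kernel value over all pairs drawn from xs and ys."""
    dim = len(xs[0])
    total = sum(gaussian_kernel(x, y, dim) for x in xs for y in ys)
    return total / (len(xs) * len(ys))

def mmd(xs, ys):
    """Biased estimate of the squared MMD between two sample sets."""
    return mean_kernel(xs, xs) + mean_kernel(ys, ys) - 2.0 * mean_kernel(xs, ys)

latents = [[0.0, 0.1], [0.2, -0.1], [-0.1, 0.0]]
shifted = [[3.0, 3.1], [3.2, 2.9], [2.9, 3.0]]
same = mmd(latents, latents)            # identical sets
different = mmd(latents, shifted)       # clearly separated sets
```

Unlike the KLD, this compares the whole batch of latents against the prior rather than penalizing each encoded point separately, which is why it does not discourage use of the latent code in the same way.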
Simply call our get_encoder_parts with arch='vanilla' in the MMDVAE builder.
latent_dim = 128
cbs = [AnnealedLossCallback(),TerminateOnNaNCallback(),
ParamScheduler({'kl_weight': SchedNo(1.,1.) })]
alpha = 20
batchmean = True
useL1 = False
hidden_dim = None
metrics = default_MMEVAE_metrics(alpha,batchmean,useL1)
batch_size = 128
dls = block.dataloaders(df, batch_size=batch_size)
arch='vanilla'
vae = MMDVAE(get_encoder_parts(arch), hidden_dim=hidden_dim,latent_dim=latent_dim, im_size=IMG_SIZE,out_range=OUT_RANGE)
loss_func = MMDLoss(batchmean=batchmean,alpha=alpha,useL1=useL1)
learn = Learner(dls, vae, cbs=cbs,loss_func=loss_func, metrics=metrics,splitter=AE_split) #.to_fp16() #wd=config['wd'],opt_func=ranger,
We build these by passing resnet18 (not 'resnet18') to get_encoder_parts helper for the parts to init the MMDVAE.
First a traditional "fine_tune" type of training:
latent_dim = 128
cbs = [AnnealedLossCallback(),TerminateOnNaNCallback(),
ParamScheduler({'kl_weight': SchedNo(1.,1.) })]
alpha = 10
batchmean = True
useL1 = False
hidden_dim = None
metrics = default_MMEVAE_metrics(alpha,batchmean,useL1)
batch_size = 128
dls = block.dataloaders(df, batch_size=batch_size)
arch=resnet18
vae = MMDVAE(get_encoder_parts(arch), hidden_dim=hidden_dim,latent_dim=latent_dim, im_size=IMG_SIZE,out_range=OUT_RANGE)
loss_func = MMDLoss(batchmean=batchmean,alpha=alpha,useL1=useL1)
learn = Learner(dls, vae, cbs=cbs,loss_func=loss_func, metrics=metrics,splitter=AE_split) #.to_fp16() #wd=config['wd'],opt_func=ranger,
learn.freeze()
n_epoch = 20
#learn.fit_flat_cos(n_epoch) #, lr=1e-3, div_final=1e6, pct_start=0.2)
learn.fit_flat_cos(n_epoch)#, lr=lr1, div_final=1e5, pct_start=0.5)
learn.fit_one_cycle(n_epoch)#,lr_max=gmlr) #, lr_max= base_lr)
learn.show_results()
n_epoch = 20
learn.unfreeze()
#learn.fit_flat_cos(n_epoch, lr=lr1, div_final=1e6, pct_start=0.7)
#learn.fit_flat_cos(n_epoch, lr=1e-3, div_final=1e5, pct_start=0.5)
learn.fit_one_cycle(n_epoch) #, lr_max= base_lr)
learn.show_results()
We build these by passing resnet18 (not 'resnet18') to get_encoder_parts helper for the parts to init the MMDVAE.
{% include note.html content='Our architecture trains best when simply starting with the pretrained weights. Trying to "fine_tune" by training the backend on a frozen resnet and then unfreezing doesn’t work with the parameter groupings from AE_split. The learn.unfreeze() doesn’t actually do anything (unfrozen is the default) but makes it clear we are unfrozen.' %}
latent_dim = 128
cbs = [AnnealedLossCallback(),TerminateOnNaNCallback(),
ParamScheduler({'kl_weight': SchedNo(1.,1.) })]
alpha = 10
batchmean = True
useL1 = False
hidden_dim = None
metrics = default_MMEVAE_metrics(alpha,batchmean,useL1)
batch_size = 128
dls = block.dataloaders(df, batch_size=batch_size)
arch=resnet18
vae = MMDVAE(get_encoder_parts(arch), hidden_dim=hidden_dim,latent_dim=latent_dim, im_size=IMG_SIZE,out_range=OUT_RANGE)
loss_func = MMDLoss(batchmean=batchmean,alpha=alpha,useL1=useL1)
learn = Learner(dls, vae, cbs=cbs,loss_func=loss_func, metrics=metrics,splitter=AE_split) #.to_fp16() #wd=config['wd'],opt_func=ranger,
learn.unfreeze()
Finally we can use the ResBlockBVAE, which instantiates a ResBlock decoder, to optimize the architecture. This follows the fastai lesson from the "bag of tricks" ResNet paper (CITATION), a general truism which could be glibly stated as: "replacing a Conv with a ResBlock always gets you better results". The class constructor ResBlockAE takes isVAE to switch between AELayer and LatentLayer latents.
TODO: update AE class to make a VAE or AE based on the isVAE switch
latent_dim = 128
alpha = 5 # doubled because latent is half?
batchmean = True
useL1 = False
# SaveModelCallback(fname=datetime.now().strftime('%Y-%m-%d %Hh%M.%S'), every_epoch=True),
cbs = [AnnealedLossCallback(),TerminateOnNaNCallback(), ParamScheduler({'kl_weight': SchedNo(1.,1.) })]
metrics = default_VAE_metrics(alpha,batchmean,useL1)
block = get_ae_DataBlock(aug=True)
batch_size = 128
dls = block.dataloaders(df, batch_size=batch_size)
arch='resnblock'
vae = ResBlockAE(get_resblockencoder_parts(arch), hidden_dim=2048,latent_dim=128, im_size=IMG_SIZE,out_range=OUT_RANGE,isVAE= True)
# let beta be calculated by : 3*im_size*im_size/latent_dim
loss_func = BVAELoss(batchmean=batchmean,alpha=alpha,useL1=False)
learn = Learner(dls, vae, cbs=cbs,loss_func=loss_func, metrics=metrics,splitter=AE_split)#.to_fp16() #wd=config['wd'],opt_func=ranger,
latent_dim = 128
# cbs = [AnnealedLossCallback(),TerminateOnNaNCallback(),
# SaveModelCallback(fname=datetime.now().strftime('%Y-%m-%d %Hh%M.%S'), every_epoch=True),
# ParamScheduler({'kl_weight': SchedNo(1.,1.) })]
cbs = [AnnealedLossCallback(),TerminateOnNaNCallback(),
ParamScheduler({'kl_weight': SchedNo(1.,1.) })]
alpha = 10
# note that alpha needs to be adjusted to scale MMD regularizer compared to error for batchmean=true
#. e.g. *= 3*IMG_SIZE**2/latent_dim
batchmean = True
useL1 = False
hidden_dim = None
metrics = default_MMEVAE_metrics(alpha,batchmean,useL1)
batch_size = 128
dls = block.dataloaders(df, batch_size=batch_size)
arch='resblock'
vae = ResBlockAE(get_resblockencoder_parts(arch), hidden_dim=hidden_dim,latent_dim=latent_dim, im_size=IMG_SIZE,out_range=OUT_RANGE)
# let beta be calculated by : 3*im_size*im_size/latent_dim
loss_func = MMDLoss(batchmean=batchmean,alpha=alpha,useL1=False)
learn = Learner(dls, vae, cbs=cbs,loss_func=loss_func, metrics=metrics,splitter=AE_split)#.to_fp16() #wd=config['wd'],opt_func=ranger,
lr1,lr2=learn.lr_find()
mlr = .5*(lr1+lr2)
#geometric mean
gmlr = torch.tensor([lr1,lr2]).log().mean().exp().tolist()
lr1,lr2,mlr,gmlr
#learn.freeze()
n_epoch = 200
#learn.fit_flat_cos(n_epoch) #, lr=1e-3, div_final=1e6, pct_start=0.2)
learn.fit_flat_cos(n_epoch)#, lr=lr1, div_final=1e5, pct_start=0.5)
#learn.fit_one_cycle(n_epoch,lr_max=gmlr) #, lr_max= base_lr)
learn.show_results()
prefix = f"MMDVae-{'TMP'}-latent{latent_dim}"
filename = f"frozen-{prefix}-{learn.model.name}-alpha{alpha:d}_{datetime.now().strftime('%Y-%m-%d_%H.%M.%S')}"
learn.save(filename)
#learn.export(f'{filename}.pkl')
latent_dim = 128
# cbs = [AnnealedLossCallback(),TerminateOnNaNCallback(),
# SaveModelCallback(fname=datetime.now().strftime('%Y-%m-%d %Hh%M.%S'), every_epoch=True),
# ParamScheduler({'kl_weight': SchedNo(1.,1.) })]
cbs = [AnnealedLossCallback(),TerminateOnNaNCallback(),
ParamScheduler({'kl_weight': SchedNo(1.,1.) })]
alpha = 20
# note that alpha needs to be adjusted to scale MMD regularizer compared to error for batchmean=true
#. e.g. *= 3*IMG_SIZE**2/latent_dim
batchmean = True
useL1 = False
hidden_dim = None
metrics = default_MMEVAE_metrics(alpha,batchmean,useL1)
batch_size = 128
dls = block.dataloaders(df, batch_size=batch_size)
arch='resblock'
vae = ResBlockAE(get_resblockencoder_parts(arch), hidden_dim=hidden_dim,latent_dim=latent_dim, im_size=IMG_SIZE,out_range=OUT_RANGE)
# let beta be calculated by : 3*im_size*im_size/latent_dim
loss_func = MMDLoss(batchmean=batchmean,alpha=alpha,useL1=False)
learn = Learner(dls, vae, cbs=cbs,loss_func=loss_func, metrics=metrics,splitter=AE_split)#.to_fp16() #wd=config['wd'],opt_func=ranger,
lr1,lr2=learn.lr_find()
mlr = .5*(lr1+lr2)
#geometric mean
gmlr = torch.tensor([lr1,lr2]).log().mean().exp().tolist()
lr1,lr2,mlr,gmlr
#learn.freeze()
n_epoch = 200
#learn.fit_flat_cos(n_epoch) #, lr=1e-3, div_final=1e6, pct_start=0.2)
learn.fit_flat_cos(n_epoch)#, lr=lr1, div_final=1e5, pct_start=0.5)
#learn.fit_one_cycle(n_epoch,lr_max=gmlr) #, lr_max= base_lr)
learn.show_results()
prefix = f"MMDVae-{'TMP'}-latent{latent_dim}"
filename = f"frozen-{prefix}-{learn.model.name}-alpha{alpha:d}_{datetime.now().strftime('%Y-%m-%d_%H.%M.%S')}"
learn.save(filename)
#learn.export(f'{filename}.pkl')
latent_dim = 64
# cbs = [AnnealedLossCallback(),TerminateOnNaNCallback(),
# SaveModelCallback(fname=datetime.now().strftime('%Y-%m-%d %Hh%M.%S'), every_epoch=True),
# ParamScheduler({'kl_weight': SchedNo(1.,1.) })]
cbs = [AnnealedLossCallback(),TerminateOnNaNCallback(),
ParamScheduler({'kl_weight': SchedNo(1.,1.) })]
alpha = 10
# note that alpha needs to be adjusted to scale MMD regularizer compared to error for batchmean=true
#. e.g. *= 3*IMG_SIZE**2/latent_dim
batchmean = True
useL1 = False
hidden_dim = None
metrics = default_MMEVAE_metrics(alpha,batchmean,useL1)
batch_size = 128
dls = block.dataloaders(df, batch_size=batch_size)
arch='resblock'
vae = ResBlockAE(get_resblockencoder_parts(arch), hidden_dim=hidden_dim,latent_dim=latent_dim, im_size=IMG_SIZE,out_range=OUT_RANGE)
# let beta be calculated by : 3*im_size*im_size/latent_dim
loss_func = MMDLoss(batchmean=batchmean,alpha=alpha,useL1=False)
learn = Learner(dls, vae, cbs=cbs,loss_func=loss_func, metrics=metrics,splitter=AE_split)#.to_fp16() #wd=config['wd'],opt_func=ranger,
lr1,lr2=learn.lr_find()
mlr = .5*(lr1+lr2)
#geometric mean
gmlr = torch.tensor([lr1,lr2]).log().mean().exp().tolist()
lr1,lr2,mlr,gmlr
#learn.freeze()
n_epoch = 200
#learn.fit_flat_cos(n_epoch) #, lr=1e-3, div_final=1e6, pct_start=0.2)
learn.fit_flat_cos(n_epoch)#, lr=lr1, div_final=1e5, pct_start=0.5)
#learn.fit_one_cycle(n_epoch,lr_max=gmlr) #, lr_max= base_lr)
learn.show_results()
prefix = f"MMDVae-{'TMP'}-latent{latent_dim}"
filename = f"frozen-{prefix}-{learn.model.name}-alpha{alpha:d}_{datetime.now().strftime('%Y-%m-%d_%H.%M.%S')}"
learn.save(filename)
#learn.export(f'{filename}.pkl')
class ResBlockBVAE(BVAE):
    """
    simple VAE with a _probably_ pretrained encoder
    """
    def __init__(self,enc_parts,hidden_dim=None, latent_dim=128, im_size=IMG_SIZE,out_range=OUT_RANGE):
        """
        inputs:
            enc_arch (pre-cut / pretrained)
            enc_dim
            latent_dim
            hidden_dim
            im_size,out_range
        """
        enc_arch,enc_feats,name = enc_parts
        # encoder
        # arch,cut = xresnet18(pretrained=True),-4
        # enc_arch = list(arch.children())[:cut]
        BASE = im_size//2**5
        enc_dim = enc_feats * BASE**2 # 2**(3*3) * (im_size//32)**2 #(output of resnet) #12800
        self.encoder = build_AE_encoder(enc_arch,enc_dim=enc_dim, hidden_dim=hidden_dim, im_size=im_size)
        in_dim = enc_dim if hidden_dim is None else hidden_dim
        # VAE Bottleneck
        self.bn = VAELayer(in_dim,latent_dim)
        # decoder
        self.decoder = build_ResBlockAE_decoder(hidden_dim=hidden_dim, latent_dim=latent_dim, im_size=im_size,out_range=out_range)
        store_attr('name,enc_dim, in_dim,hidden_dim,latent_dim,im_size,out_range') # do i need all these?

    # THESE ARE INHERITED..
    # def decode(self, z):
    #     z = self.decoder(z)
    #     return z
    # def reparam(self, h):
    #     return self.bn(h)
    # def encode(self, x):
    #     h = self.encoder(x)
    #     z, mu, logvar = self.reparam(h)
    #     return z, mu, logvar
    # def forward(self, x):
    #     z, mu, logvar = self.encode(x)
    #     x_hat = self.decode(z)
    #     latents = torch.stack([mu,logvar],dim=-1)
    #     return x_hat, latents # assume dims are [batch,latent_dim,concat_dim]
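The enc_dim arithmetic in the constructor above can be checked in isolation. The sketch assumes 512 output channels (as in the final stage of a resnet18) and an image size of 160; the image size is a guess chosen because it reproduces the 12800 noted in the code comment, and `flattened_encoder_dim` is a hypothetical helper.

```python
def flattened_encoder_dim(enc_feats, im_size, n_halvings=5):
    """Features left after `n_halvings` stride-2 stages, flattened."""
    base = im_size // 2 ** n_halvings       # spatial side of the last feature map
    return enc_feats * base ** 2

# 512 channels on a 5x5 map when the input is 160x160 (assumed values)
enc_dim = flattened_encoder_dim(enc_feats=512, im_size=160)
```

Because of the integer division, image sizes that are not multiples of 32 silently round the feature map down, so the linear layer after the encoder would no longer match the flattened output.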
latent_dim = 128
alpha = 5 # doubled because latent is half?
batchmean = True
useL1 = False
# SaveModelCallback(fname=datetime.now().strftime('%Y-%m-%d %Hh%M.%S'), every_epoch=True),
cbs = [AnnealedLossCallback(),TerminateOnNaNCallback(), ParamScheduler({'kl_weight': SchedNo(1.,1.) })]
metrics = default_VAE_metrics(alpha,batchmean,useL1)
block = get_ae_DataBlock(aug=True)
batch_size = 128
dls = block.dataloaders(df, batch_size=batch_size)
arch='resnblock'
vae = ResBlockBVAE(get_resblockencoder_parts(arch), hidden_dim=2048,latent_dim=128, im_size=IMG_SIZE,out_range=OUT_RANGE)
# let beta be calculated by : 3*im_size*im_size/latent_dim
loss_func = BVAELoss(batchmean=batchmean,alpha=alpha,useL1=False)
learn = Learner(dls, vae, cbs=cbs,loss_func=loss_func, metrics=metrics,splitter=AE_split)#.to_fp16() #wd=config['wd'],opt_func=ranger,
lr1,lr2=learn.lr_find()
mlr = .5*(lr1+lr2)
#geometric mean
gmlr = torch.tensor([lr1,lr2]).log().mean().exp().tolist()
lr1,lr2,mlr,gmlr
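The cell above computes both an arithmetic (`mlr`) and a geometric (`gmlr`) midpoint of the two learning-rate suggestions. A small sketch of the difference, using made-up values for `lr1` and `lr2` (the real values come from `learn.lr_find()`): because learning rates are usually compared on a log scale, the geometric mean sits halfway between the two suggestions in orders of magnitude, while the arithmetic mean is dominated by the larger one.

```python
import math

def lr_midpoints(lr1, lr2):
    """Return (arithmetic mean, geometric mean) of two learning rates."""
    arithmetic = 0.5 * (lr1 + lr2)
    # Same computation as log().mean().exp() on a two-element tensor.
    geometric = math.exp(0.5 * (math.log(lr1) + math.log(lr2)))
    return arithmetic, geometric

# Hypothetical suggestions two orders of magnitude apart.
mlr_demo, gmlr_demo = lr_midpoints(1e-4, 1e-2)
print(mlr_demo, gmlr_demo)  # arithmetic is about 5e-3, geometric is about 1e-3
```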
learn.remove_cb(learn.cbs[-1])
# swap the constant scheduler for the KL anneal-in schedule
learn.add_cb(ParamScheduler({'kl_weight': default_KL_anneal_in()} ) )
# the defaults are pretty good for now
n_epochs = 10
#learn.fit_one_cycle(freeze_epochs1,lr_max= lr1)#, lr_max= base_lr)
#learn.fit_flat_cos(n_epochs, lr=lr1, pct_start=0.5)
#learn.fit_flat_cos(n_epochs) #, lr=1e-4,pct_start=0.5)
learn.fit_one_cycle(n_epochs)#, lr_max= base_lr)
learn.show_results()
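This initial "burn-in" of the KLD regularization is very unstable, so the KL weight is annealed in over the first 10 epochs before switching back to a constant weight of 1 for the longer 100-epoch run.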
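To make the anneal-in idea concrete, here is a minimal sketch of a KL weight schedule. It is not the implementation of `default_KL_anneal_in` (which is defined elsewhere); the function name, the linear ramp, and the `ramp_pct` parameter are assumptions for illustration. fastai's `ParamScheduler` calls each schedule with the fraction of training completed, so the sketch takes a `pct` in `[0, 1]`.

```python
def kl_weight_linear(pct, start=0.0, end=1.0, ramp_pct=0.5):
    """Hypothetical KL weight schedule.

    Ramps linearly from `start` to `end` over the first `ramp_pct`
    of training, then holds at `end`.
    """
    if not 0.0 <= pct <= 1.0:
        raise ValueError("pct must be in [0, 1]")
    if pct >= ramp_pct:
        return end
    return start + (end - start) * pct / ramp_pct

# Weight at 0%, 10%, ..., 100% of training.
weights = [kl_weight_linear(i / 10) for i in range(11)]
print(weights)
```

Starting with a small KL weight lets the reconstruction term shape the latent space first; the constant `SchedNo(1.,1.)` schedule then applies the full penalty for the rest of training.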
learn.remove_cb(learn.cbs[-1])
# add new constant scheduler
learn.add_cb(ParamScheduler({'kl_weight': SchedNo(1.,1.) }) )
#learn.unfreeze()
lr1,lr2=learn.lr_find()
mlr = .5*(lr1+lr2)
#geometric mean
gmlr = torch.tensor([lr1,lr2]).log().mean().exp().tolist()
lr1,lr2,mlr,gmlr
base_lr = 1e-5# gmlr #/= 2
epochs = 100
#learn.fit_one_cycle(epochs, slice(base_lr/lr_mult, base_lr), pct_start=pct_start, div=div)
#learn.fit_one_cycle(epochs, lr_max= 1e-3)
#learn.fit_flat_cos(epochs,lr=lr1,pct_start=.05)
learn.fit_flat_cos(epochs,div_final=1000)#,lr=1e-4)
learn.show_results()
prefix = f"BVae-{'2step10_100'}-latent{latent_dim}"
filename = f"{prefix}-{learn.model.name}-alpha{alpha:d}_{datetime.now().strftime('%Y-%m-%d_%H.%M.%S')}"
learn.save(filename)
learn.export(f'{filename}.pkl')
x = 1
latent_dim = 128
alpha = 10 # doubled because latent is half?
batchmean = True
useL1 = False
# SaveModelCallback(fname=datetime.now().strftime('%Y-%m-%d %Hh%M.%S'), every_epoch=True),
cbs = [AnnealedLossCallback(),TerminateOnNaNCallback(), ParamScheduler({'kl_weight': SchedNo(1.,1.) })]
metrics = default_VAE_metrics(alpha,batchmean,useL1)
block = get_ae_DataBlock(aug=True)
batch_size = 128
dls = block.dataloaders(df, batch_size=batch_size)
arch='resnblock'
vae = ResBlockBVAE(get_resblockencoder_parts(arch), hidden_dim=2048,latent_dim=128, im_size=IMG_SIZE,out_range=OUT_RANGE)
# let beta be calculated by : 3*im_size*im_size/latent_dim
loss_func = BVAELoss(batchmean=batchmean,alpha=alpha,useL1=False)
learn = Learner(dls, vae, cbs=cbs,loss_func=loss_func, metrics=metrics,splitter=AE_split)#.to_fp16() #wd=config['wd'],opt_func=ranger,
lr1,lr2=learn.lr_find()
mlr = .5*(lr1+lr2)
#geometric mean
gmlr = torch.tensor([lr1,lr2]).log().mean().exp().tolist()
lr1,lr2,mlr,gmlr
learn.remove_cb(learn.cbs[-1])
# swap the constant scheduler for the KL anneal-in schedule
learn.add_cb(ParamScheduler({'kl_weight': default_KL_anneal_in()} ) )
# # the defaults are pretty good for now
n_epochs = 10
learn.fit_one_cycle(n_epochs)#, lr_max= base_lr)
# #learn.fit_flat_cos(n_epochs, lr=lr1, pct_start=0.5)
# learn.fit_flat_cos(n_epochs, lr=1e-4,pct_start=0.5)
# learn.show_results()
# # the defaults are pretty good for now
# n_epochs = 10
# learn.fit_one_cycle(10)#, lr_max= base_lr)
# #learn.fit_flat_cos(n_epochs, lr=lr1, pct_start=0.5)
# #learn.fit_flat_cos(n_epochs, lr=1e-4,pct_start=0.5)
# learn.show_results()
As in the previous run, the initial "burn-in" of the KLD regularization is very unstable; after it, switch back to a constant KL weight of 1.
learn.remove_cb(learn.cbs[-1])
# add new constant scheduler
learn.add_cb(ParamScheduler({'kl_weight': SchedNo(1.,1.) }) )
#learn.unfreeze()
lr1,lr2=learn.lr_find()
mlr = .5*(lr1+lr2)
#geometric mean
gmlr = torch.tensor([lr1,lr2]).log().mean().exp().tolist()
lr1,lr2,mlr,gmlr
epochs = 100
#learn.fit_one_cycle(epochs, slice(base_lr/lr_mult, base_lr), pct_start=pct_start, div=div)
#learn.fit_one_cycle(epochs, lr_max= 1e-3)
#learn.fit_flat_cos(epochs,lr=lr1,pct_start=.05)
learn.fit_flat_cos(epochs,div_final = 1000)#,lr=1e-4)
learn.show_results()
learn.lr
# filename = f"{prefix}-{learn.model.name}-alpha{alpha:d}_{datetime.now().strftime('%Y-%m-%d_%H.%M.%S')}"
# learn.save(filename)
# learn.export(f'{filename}.pkl')
# base_lr = 1e-5# gmlr #/= 2
# epochs = 50
# #learn.fit_one_cycle(epochs, slice(base_lr/lr_mult, base_lr), pct_start=pct_start, div=div)
# #learn.fit_one_cycle(epochs, lr_max= 1e-3)
# #learn.fit_flat_cos(epochs,lr=lr1,pct_start=.05)
# learn.fit_flat_cos(epochs)
# learn.show_results()
# filename = f"{prefix}-{learn.model.name}-alpha{alpha:d}_{datetime.now().strftime('%Y-%m-%d_%H.%M.%S')}"
# learn.save(filename)
# learn.export(f'{filename}.pkl')
# filename = f"{prefix}-{learn.model.name}-alpha{alpha:d}_{datetime.now().strftime('%Y-%m-%d_%H.%M.%S')}"
# filename = "BVae-POST-1CYCLE10-latent128-resblock-alpha10_2021-03-24_21.33.02"
# learn.load(filename)
# #epochs = 5
# epochs = 10
# learn.fit_one_cycle(epochs, lr_max=.001)
# #learn.fit_flat_cos(epochs,lr=.0015,pct_start=.5,div_final=1000.0)
# #learn.fit_one_cycle(epochs,lr_max=5e-3,pct_start=0.5,div_final=100000) # gets down to ~4500 loss in 10
# learn.show_results()
prefix = f"BVae-{'2step10_100'}-latent{latent_dim}"
filename = f"{prefix}-{learn.model.name}-alpha{alpha:d}_{datetime.now().strftime('%Y-%m-%d_%H.%M.%S')}"
learn.save(filename)
learn.export(f'{filename}.pkl')
latent_dim = 64
alpha = 5 # doubled because latent is half?
batchmean = True
useL1 = False
# SaveModelCallback(fname=datetime.now().strftime('%Y-%m-%d %Hh%M.%S'), every_epoch=True),
cbs = [AnnealedLossCallback(),TerminateOnNaNCallback(), ParamScheduler({'kl_weight': SchedNo(1.,1.) })]
metrics = default_VAE_metrics(alpha,batchmean,useL1)
block = get_ae_DataBlock(aug=True)
batch_size = 128
dls = block.dataloaders(df, batch_size=batch_size)
arch='resnblock'
vae = ResBlockBVAE(get_resblockencoder_parts(arch), hidden_dim=2048,latent_dim=latent_dim, im_size=IMG_SIZE,out_range=OUT_RANGE)
# let beta be calculated by : 3*im_size*im_size/latent_dim
loss_func = BVAELoss(batchmean=batchmean,alpha=alpha,useL1=False)
learn = Learner(dls, vae, cbs=cbs,loss_func=loss_func, metrics=metrics,splitter=AE_split)#.to_fp16() #wd=config['wd'],opt_func=ranger,
lr1,lr2=learn.lr_find()
mlr = .5*(lr1+lr2)
#geometric mean
gmlr = torch.tensor([lr1,lr2]).log().mean().exp().tolist()
lr1,lr2,mlr,gmlr
learn.remove_cb(learn.cbs[-1])
# swap the constant scheduler for the KL anneal-in schedule
learn.add_cb(ParamScheduler({'kl_weight': default_KL_anneal_in()} ) )
n_epochs = 10
learn.fit_one_cycle(n_epochs)#,lr_max= lr1)#, lr_max= base_lr)
#learn.fit_flat_cos(n_epochs, lr=lr1, pct_start=0.5)
#learn.fit_flat_cos(n_epochs, lr=1e-4,pct_start=0.5)
learn.show_results()
As in the previous runs, the initial "burn-in" of the KLD regularization is very unstable; after it, switch back to a constant KL weight of 1.
learn.remove_cb(learn.cbs[-1])
# add new constant scheduler
learn.add_cb(ParamScheduler({'kl_weight': SchedNo(1.,1.) }) )
#learn.unfreeze()
lr1,lr2=learn.lr_find()
mlr = .5*(lr1+lr2)
#geometric mean
gmlr = torch.tensor([lr1,lr2]).log().mean().exp().tolist()
lr1,lr2,mlr,gmlr
base_lr = 1e-5# gmlr #/= 2
epochs = 100
#learn.fit_one_cycle(epochs, slice(base_lr/lr_mult, base_lr), pct_start=pct_start, div=div)
#learn.fit_one_cycle(epochs, lr_max= 1e-3)
#learn.fit_flat_cos(epochs,lr=lr1,pct_start=.05)
#learn.fit_flat_cos(epochs,lr=1e-4)
learn.fit_flat_cos(epochs, div_final=1000.0)#,lr=1e-3)
learn.show_results()
prefix = f"BVae-{'2step10_100'}-latent{latent_dim}"
filename = f"{prefix}-{learn.model.name}-alpha{alpha:d}_{datetime.now().strftime('%Y-%m-%d_%H.%M.%S')}"
learn.save(filename)
learn.export(f'{filename}.pkl')
latent_dim = 64
alpha = 10 # doubled because latent is half?
batchmean = True
useL1 = False
# SaveModelCallback(fname=datetime.now().strftime('%Y-%m-%d %Hh%M.%S'), every_epoch=True),
cbs = [AnnealedLossCallback(),TerminateOnNaNCallback(), ParamScheduler({'kl_weight': SchedNo(1.,1.) })]
metrics = default_VAE_metrics(alpha,batchmean,useL1)
block = get_ae_DataBlock(aug=True)
batch_size = 128
dls = block.dataloaders(df, batch_size=batch_size)
arch='resnblock'
vae = ResBlockBVAE(get_resblockencoder_parts(arch), hidden_dim=2048,latent_dim=latent_dim, im_size=IMG_SIZE,out_range=OUT_RANGE)
# let beta be calculated by : 3*im_size*im_size/latent_dim
loss_func = BVAELoss(batchmean=batchmean,alpha=alpha,useL1=False)
learn = Learner(dls, vae, cbs=cbs,loss_func=loss_func, metrics=metrics,splitter=AE_split)#.to_fp16() #wd=config['wd'],opt_func=ranger,
lr1,lr2=learn.lr_find()
mlr = .5*(lr1+lr2)
#geometric mean
gmlr = torch.tensor([lr1,lr2]).log().mean().exp().tolist()
lr1,lr2,mlr,gmlr
learn.remove_cb(learn.cbs[-1])
# swap the constant scheduler for the KL anneal-in schedule
learn.add_cb(ParamScheduler({'kl_weight': default_KL_anneal_in()} ) )
# the defaults are pretty good for now
n_epochs = 10
learn.fit_one_cycle(n_epochs)#,lr_max= lr1)#, lr_max= base_lr)
#learn.fit_flat_cos(n_epochs, lr=lr1, pct_start=0.5)
#learn.fit_flat_cos(n_epochs, lr=1e-4,pct_start=0.5)
learn.show_results()
As in the previous runs, the initial "burn-in" of the KLD regularization is very unstable; after it, switch back to a constant KL weight of 1.
learn.remove_cb(learn.cbs[-1])
# add new constant scheduler
learn.add_cb(ParamScheduler({'kl_weight': SchedNo(1.,1.) }) )
#learn.unfreeze()
lr1,lr2=learn.lr_find()
mlr = .5*(lr1+lr2)
#geometric mean
gmlr = torch.tensor([lr1,lr2]).log().mean().exp().tolist()
lr1,lr2,mlr,gmlr
base_lr = 1e-5# gmlr #/= 2
epochs = 100
#learn.fit_one_cycle(epochs, slice(base_lr/lr_mult, base_lr), pct_start=pct_start, div=div)
#learn.fit_one_cycle(epochs, lr_max= 1e-3)
#learn.fit_flat_cos(epochs,lr=lr1,pct_start=.05)
learn.fit_flat_cos(epochs,div_final=1000.)
learn.show_results()
prefix = f"BVae-{'2step10_100'}-latent{latent_dim}"
filename = f"{prefix}-{learn.model.name}-alpha{alpha:d}_{datetime.now().strftime('%Y-%m-%d_%H.%M.%S')}"
learn.save(filename)
learn.export(f'{filename}.pkl')
class LatentTuple(fastuple):
    "Basic type for tuple of tensor (vectors)"
    _show_args = dict(s=10, marker='.', c='r')
    @classmethod
    def create(cls, ts):
        if isinstance(ts,tuple):
            mu,logvar = ts
        elif ts is None:
            mu,logvar = None,None
        else:
            mu = None
            logvar = None
        # fall back to empty tensors, and convert non-tensor inputs (the result must be assigned)
        if mu is None: mu = torch.empty(0)
        elif not isinstance(mu, Tensor): mu = Tensor(mu)
        if logvar is None: logvar = torch.empty(0)
        elif not isinstance(logvar,Tensor): logvar = Tensor(logvar)
        return cls( (mu,logvar) )
    def show(self, ctx=None, **kwargs):
        mu,logvar = self
        if not isinstance(mu, Tensor) or not isinstance(logvar,Tensor): return ctx
        title_str = f"mu-> {mu.mean():e}, {mu.std():e} logvar->{logvar.mean():e}, {logvar.std():e}"
        if 'figsize' in kwargs: del kwargs['figsize']
        if 'title' in kwargs: kwargs['title']=title_str
        if ctx is None:
            _,axs = plt.subplots(1,2, figsize=(12,6))
            x=torch.linspace(0,1,mu[0].shape[0])
            axs[0].scatter(x, mu[:], **{**self._show_args, **kwargs})
            axs[1].scatter(x, logvar[:], **{**self._show_args, **kwargs})
            ctx = axs[1]
        ctx.scatter(mu[:], logvar[:], **{**self._show_args, **kwargs})
        return ctx
# could we do a typedispatch to manage the transforms...?
def VAETargetTupleBlock():
    return TransformBlock(type_tfms=VAETargetTuple.create, batch_tfms=IntToFloatTensor)
def LatentTupleBlock():
    return TransformBlock(type_tfms=LatentTuple.create, batch_tfms=noop)
# class TensorPoint(TensorBase):
# "Basic type for points in an image"
# _show_args = dict(s=10, marker='.', c='r')
# @classmethod
# def create(cls, t, img_size=None)->None:
# "Convert an array or a list of points `t` to a `Tensor`"
# return cls(tensor(t).view(-1, 2).float(), img_size=img_size)
# def show(self, ctx=None, **kwargs):
# if 'figsize' in kwargs: del kwargs['figsize']
# x = self.view(-1,2)
# ctx.scatter(x[:, 0], x[:, 1], **{**self._show_args, **kwargs})
# return ctx
latent_dim = 128
dropout = .2
im_size = IMG_SIZE
n_blocks = 5
nfs = [3] + [2**i*n_blocks for i in range(n_blocks+1)]
nfs.reverse()
# decoder = nn.Sequential(
# nn.Linear(latent_size, 16),
# UnFlatten(4),
# ResBlock(1, 3, 4, act_cls=Mish),
# nn.Dropout2d(dropout),
# nn.Upsample(scale_factor=2),
# ResBlock(1, 4, 8, act_cls=Mish),
# nn.Dropout2d(dropout),
# nn.Upsample(scale_factor=2),
# ResBlock(1, 8, 16, act_cls=Mish),
# nn.Dropout2d(dropout),
# nn.Upsample(scale_factor=2),
# nn.Conv2d(16, 1, 3),
# nn.Dropout2d(dropout),
# #nn.AdaptiveAvgPool2d((3,im_size, im_size))
# )
n_blocks = 5
hidden_dim = 2048
out_range = [-1,1]
tst = nn.Sequential(
nn.Linear(latent_dim,hidden_dim), #nn.Linear(latent_dim, 16)
nn.Linear(hidden_dim,im_size*n_blocks*n_blocks), #nn.Linear(latent_dim, 16)
ResizeBatch(im_size,n_blocks,n_blocks),#UnFlatten(n_blocks), #4
ResBlock(1, nfs[0], nfs[1], act_cls=Mish), #ResBlock(1, 1, 4, act_cls=Mish),
nn.Dropout2d(dropout),
nn.Upsample(scale_factor=2),
ResBlock(1, nfs[1], nfs[2], act_cls=Mish), #ResBlock(1, 4, 8, act_cls=Mish),
nn.Dropout2d(dropout),
nn.Upsample(scale_factor=2),
ResBlock(1, nfs[2], nfs[3], act_cls=Mish), #ResBlock(1, 8, 16, act_cls=Mish),
nn.Dropout2d(dropout),
nn.Upsample(scale_factor=2),
ResBlock(1, nfs[3], nfs[4], act_cls=Mish), #nn.Conv2d(16, 1, 3),
nn.Dropout2d(dropout),
nn.Upsample(scale_factor=2),
ResBlock(1, nfs[4], nfs[5], act_cls=Mish), #nn.Conv2d(16, 1, 3),
nn.Dropout2d(dropout),
nn.Upsample(scale_factor=2),
ResBlock(1, nfs[5], nfs[6], act_cls=Mish), #nn.Conv2d(16, 1, 3),
#nn.Dropout2d(dropout),
#nn.Upsample(scale_factor=2), #
#nn.AdaptiveAvgPool2d((3,im_size, im_size)),
SigmoidRange(*out_range), #nn.Sigmoid()
)
tst,nfs
inp= torch.randn((32,latent_dim))
#last_size = model_sizes(tst ) #[-1][1]
#num_features_model(tst)
#last_size
#nfs
tst(inp).shape,nfs
#last_size
z_dim = 100
enc = nn.Sequential(
ResBlock(1, 1, 16, act_cls=nn.ReLU, norm_type=None),
nn.MaxPool2d(2, 2),
ResBlock(1, 16, 4, act_cls=nn.ReLU, norm_type=None),
nn.MaxPool2d(2, 2),
Flatten()
)
# torch.Size([32, 1, 28, 28])
# torch.Size([32, 16, 28, 28])
# torch.Size([32, 16, 14, 14])
# torch.Size([32, 4, 14, 14])
# torch.Size([32, 4, 7, 7])
# torch.Size([32, 196])
latent_size = 100
enc = nn.Sequential(
ResBlock(1, 3, 5, stride=2, act_cls=Mish),# 1->3
ResBlock(1, 5, 5, stride=2, act_cls=Mish),
ResBlock(1, 5, 1, stride=2, act_cls=Mish),
Flatten(),
nn.Linear(400, latent_size) # 16->400
)
# torch.Size([32, 1, 28, 28])
# torch.Size([32, 5, 14, 14])
# torch.Size([32, 5, 7, 7])
# torch.Size([32, 1, 4, 4])
# torch.Size([32, 16])
# torch.Size([32, 4])
inp= torch.randn((32,3,160,160))
for ii in range(0,8):
    print(enc[:ii](inp).shape)
z = enc(inp)
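The `# 16->400` note on the `nn.Linear(400, latent_size)` line follows from the input size changing from 28 to 160 pixels. As a rough check, assuming each stride-2 `ResBlock` downsamples like a 3x3 convolution with padding 1 (an assumption about the block's internals, not something shown in this notebook), the spatial sizes can be traced with plain arithmetic. The helper names below are hypothetical.

```python
def conv_out(n, ks=3, stride=2, padding=1):
    """Output length of a convolution along one spatial dimension."""
    return (n + 2 * padding - ks) // stride + 1

def trace_sizes(n, n_blocks=3):
    """Spatial size after each of `n_blocks` stride-2 blocks, starting from `n`."""
    sizes = [n]
    for _ in range(n_blocks):
        sizes.append(conv_out(sizes[-1]))
    return sizes

print(trace_sizes(160))  # [160, 80, 40, 20] -> 1 channel * 20 * 20 = 400 features
print(trace_sizes(28))   # [28, 14, 7, 4]    -> 1 channel * 4 * 4 = 16 features
```

This matches the shapes listed in the comments for the 28x28 case and explains why the linear layer needs 400 inputs for 160x160 images.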
dropout=0
dec = nn.Sequential(
nn.Linear(latent_size, 16),
UnFlatten(4),
ResBlock(1, 1, 4, act_cls=Mish), #4->5
#nn.Dropout2d(dropout),
nn.Upsample(scale_factor=2),
ResBlock(1, 4, 8, act_cls=Mish),
#nn.Dropout2d(dropout),
nn.Upsample(scale_factor=2),
ResBlock(1, 8, 16, act_cls=Mish),
#nn.Dropout2d(dropout),
nn.Upsample(scale_factor=2),
nn.Conv2d(16, 3, 3), #1->3
#nn.Dropout2d(dropout),
nn.AdaptiveAvgPool2d((28, 28)),
nn.Sigmoid()
)
# torch.Size([32, 4])
# torch.Size([32, 16])
# torch.Size([32, 1, 4, 4])
# torch.Size([32, 4, 4, 4])
# torch.Size([32, 4, 8, 8])
# torch.Size([32, 8, 8, 8])
# torch.Size([32, 8, 16, 16])
# torch.Size([32, 16, 16, 16])
# torch.Size([32, 16, 32, 32])
# torch.Size([32, 1, 30, 30])
# torch.Size([32, 1, 28, 28])
for ii in range(0,12):
    print(dec[:ii](z).shape)
n_blocks = 5
BASE = im_size//2**5
nfs = [3]+[(2**i)*BASE for i in range(n_blocks)]
n = len(nfs)
hidden_dim = 2048
BASE = im_size//2**5
# encoder
in_dim = nfs[-1] * BASE**2
modules = [ResBlock(1, nfs[i],nfs[i+1],
stride=2, act_cls=Mish) for i in range(n - 1)]
# enc = nn.Sequential(
# ConvLayer(nfs[0],nfs[1],ks=5,stride=2,padding=2),
# ConvLayer(nfs[1],nfs[2],ks=5,stride=2,padding=2),
# ConvLayer(nfs[2],nfs[3],ks=5,stride=2,padding=2),
# ConvLayer(nfs[3],nfs[4],ks=5,stride=2,padding=2),
# ConvLayer(nfs[4],nfs[5],ks=5,stride=2,padding=2),
# Flatten(),
# LinBnDrop(in_dim,hidden_dim,bn=True,p=0.0,act=nn.ReLU(),lin_first=True)
# )
enc = nn.Sequential(*modules,
Flatten(),
LinBnDrop(in_dim,hidden_dim,bn=True,p=0.0,act=nn.ReLU(),lin_first=True)
)
nfs.reverse()
print(nfs)
#last_size = model_sizes(enc, size=(28,28))[-1][1]
encoder = nn.Sequential(enc, nn.Linear(hidden_dim, z_dim))
decoder = nn.Sequential(
nn.Linear(z_dim, im_size*n_blocks*n_blocks),
ResizeBatch(im_size,n_blocks,n_blocks),#UnFlatten(n_blocks), #4
ResBlock(1, nfs[0], nfs[1], ks=1, act_cls=nn.ReLU, norm_type=None),
#nn.Dropout2d(dropout),
nn.Upsample(scale_factor=2),
ResBlock(1, nfs[1], nfs[2], act_cls=nn.ReLU, norm_type=None),
#nn.Dropout2d(dropout),
nn.Upsample(scale_factor=2),
nn.Conv2d(nfs[2], 3, 3, padding=1),
#nn.Dropout2d(dropout),
nn.Sigmoid()
)
#last_size
# nn.Linear(latent_dim,hidden_dim), #nn.Linear(latent_dim, 16)
# nn.Linear(hidden_dim,im_size*n_blocks*n_blocks), #nn.Linear(latent_dim, 16)
# ResizeBatch(im_size,n_blocks,n_blocks),#UnFlatten(n_blocks), #4
# ResBlock(1, nfs[0], nfs[1], act_cls=Mish), #ResBlock(1, 1, 4, act_cls=Mish),
# nn.Dropout2d(dropout),
# nn.Upsample(scale_factor=2),
# ResBlock(1, nfs[1], nfs[2], act_cls=Mish), #ResBlock(1, 4, 8, act_cls=Mish),
# nn.Dropout2d(dropout),
# nn.Upsample(scale_factor=2),
# ResBlock(1, nfs[2], nfs[3], act_cls=Mish), #ResBlock(1, 8, 16, act_cls=Mish),
# nn.Dropout2d(dropout),
# nn.Upsample(scale_factor=2),
# ResBlock(1, nfs[3], nfs[4], act_cls=Mish), #nn.Conv2d(16, 1, 3),
# nn.Dropout2d(dropout),
# nn.Upsample(scale_factor=2),
# ResBlock(1, nfs[4], nfs[5], act_cls=Mish), #nn.Conv2d(16, 1, 3),
# nn.Dropout2d(dropout),
# nn.Upsample(scale_factor=2),
# ResBlock(1, nfs[5], nfs[6], act_cls=Mish), #nn.Conv2d(16, 1, 3),
inp= torch.randn((32,3,160,160))
#encoder[:1](inp).shape
for ii in range(0,10):
    print(enc[:ii](inp).shape)
z = encoder(inp)
for ii in range(0,14):
    print(decoder[:ii](z).shape)
class UnFlatten(Module):
    def __init__(self, size=7):
        self.size = size
    def forward(self, input):
        return input.view(input.size(0), -1, self.size, self.size)
class MMD_VAE(Module):
    def __init__(self, latent_size):
        self.encoder = nn.Sequential(
            ResBlock(1, 1, 5, stride=2, act_cls=Mish),
            ResBlock(1, 5, 5, stride=2, act_cls=Mish),
            ResBlock(1, 5, 1, stride=2, act_cls=Mish),
            Flatten(),
            nn.Linear(16, latent_size)
        )
        dropout=0
        self.decoder = nn.Sequential(
            nn.Linear(latent_size, 16),
            UnFlatten(4),
            ResBlock(1, 1, 4, act_cls=Mish),
            nn.Dropout2d(dropout),
            nn.Upsample(scale_factor=2),
            ResBlock(1, 4, 8, act_cls=Mish),
            nn.Dropout2d(dropout),
            nn.Upsample(scale_factor=2),
            ResBlock(1, 8, 16, act_cls=Mish),
            nn.Dropout2d(dropout),
            nn.Upsample(scale_factor=2),
            nn.Conv2d(16, 1, 3),
            nn.Dropout2d(dropout),
            nn.AdaptiveAvgPool2d((28, 28)),
            nn.Sigmoid()
        )
    def forward(self, X):
        latent = self.encoder(X)
        return self.decoder(latent), latent
#decoder
n_blocks = 5
nfs = [3] + [2**i*n_blocks for i in range(n_blocks+1)]
nfs.reverse()
n = len(nfs)
tst = nn.Sequential(
    nn.Linear(latent_dim,hidden_dim), #nn.Linear(latent_dim, 16)
    nn.Linear(hidden_dim,im_size*n_blocks*n_blocks), #nn.Linear(latent_dim, 16)
    UnFlatten(n_blocks), #4
    ResBlock(1, nfs[0], nfs[1], act_cls=Mish), #ResBlock(1, 1, 4, act_cls=Mish),
    nn.Dropout2d(dropout),
    nn.Upsample(scale_factor=2),
    ResBlock(1, nfs[1], nfs[2], act_cls=Mish), #ResBlock(1, 4, 8, act_cls=Mish),
    nn.Dropout2d(dropout),
    nn.Upsample(scale_factor=2),
    ResBlock(1, nfs[2], nfs[3], act_cls=Mish), #ResBlock(1, 8, 16, act_cls=Mish),
    nn.Dropout2d(dropout),
    nn.Upsample(scale_factor=2),
    ResBlock(1, nfs[3], nfs[4], act_cls=Mish), #nn.Conv2d(16, 1, 3),
    nn.Dropout2d(dropout),
    nn.AdaptiveAvgPool2d((3,im_size, im_size)),
    SigmoidRange(*out_range), #nn.Sigmoid()
    *modules,
    ConvLayer(nfs[-2],nfs[-1],
              ks=1,padding=0, norm_type=None, #act_cls=nn.Sigmoid) )
              act_cls=partial(SigmoidRange, *out_range)))
lr1,lr2=learn.lr_find()
mlr = .5*(lr1+lr2)
#geometric mean
gmlr = torch.tensor([lr1,lr2]).log().mean().exp().tolist()
lr1,lr2,mlr,gmlr
n_epoch = 10
#learn.fit_flat_cos(n_epoch) #, lr=1e-3, div_final=1e6, pct_start=0.2)
learn.fit_flat_cos(n_epoch, lr=lr1, div_final=1e5, pct_start=0.5)
#learn.fit_one_cycle(n_epoch) #, lr_max= base_lr)
learn.show_results()
n_epoch = 40
#learn.unfreeze()
#learn.fit_flat_cos(n_epoch, lr=1e-3, div_final=1e6, pct_start=0.2)
learn.fit_flat_cos(n_epoch, lr=lr1, div_final=1e6, pct_start=0.05)
#learn.fit_flat_cos(n_epoch, lr=1e-3, div_final=1e5, pct_start=0.5)
#learn.fit_one_cycle(n_epoch) #, lr_max= base_lr)
learn.show_results()
prefix = f"MMDVae-{'bmean' if batchmean else 'mean'}{'l1' if useL1 else 'l2'}"
filename = f"frozen{prefix}-{learn.model.name}-alpha{alpha:d}_{datetime.now().strftime('%Y-%m-%d_%H.%M.%S')}"
learn.save(filename)
learn.export(f'{filename}.pkl')
x=1
x
def create_encoder(nfs,ks,conv=nn.Conv2d,bn=nn.BatchNorm2d,act_fn = nn.ReLU):
    """
    constructor for generic convolutional encoder
    """
    n = len(nfs)
    conv_layers = [nn.Sequential(ConvBnRelu(nfs[i],nfs[i+1],kernel_size=ks[i],
                                            conv = conv,bn=bn,act_fn=act_fn, padding = ks[i] //2 ),
                                 Downsample(channels=nfs[i+1],filt_size=3,stride=2))
                   for i in range(n-1)]
    convs = nn.Sequential(*conv_layers)
    return convs
def create_encoder_denseblock(n_dense,c_start):
    """
    constructor for resnet with dense blocks (?)
    CURR VALUES:
    "n_dense": 3,
    "c_start": 4
    """
    first_layer = nn.Sequential(ConvBnRelu(3,c_start,kernel_size=3,padding = 1),
                                ResBlock(c_start),
                                Downsample(channels=4,filt_size=3,stride=2))
    layers = [first_layer] + [
        nn.Sequential(
            DenseBlock(c_start * (2**c)),
            Downsample(channels=c_start * (2**(c+1)),filt_size=3,stride=2)) for c in range(n_dense)
    ]
    model = nn.Sequential(*layers)
    return model
def create_decoder(nfs, ks, size, conv=nn.Conv2d, bn=nn.BatchNorm2d, act_fn=nn.ReLU):
    """
    CURR VALUES:
    "nfs":[66,3*32,3*16,3*8,3*4,3*2,3,1,3],
    "ks": [ 3, 1, 3,1,3,1,3,1],
    "size": IMG_SIZE
    """
    n = len(nfs)
    # We add two channels to the first layer to include x and y channels
    first_layer = ConvBnRelu(nfs[0], #input size
                             nfs[1], # output size
                             conv = PointwiseConv,
                             bn=bn,
                             act_fn=act_fn)
    conv_layers = [first_layer] + [ConvBnRelu(nfs[i],nfs[i+1],kernel_size=ks[i-1],
                                              padding = ks[i-1] // 2,conv = conv,bn=bn,act_fn=act_fn)
                                   for i in range(1,n - 1)]
    dec_convs = nn.Sequential(*conv_layers)
    dec = nn.Sequential(SpatialDecoder2D(size),dec_convs)
    #SigmoidRange(*y_range)
    return dec
def decoder_simple(y_range=OUT_RANGE, n_out=3):
    return nn.Sequential(#UpsampleBlock(64),
                         UpsampleBlock(32),
                         nn.Conv2d(16, n_out, 1),
                         SigmoidRange(*y_range)
                         )